Iterative Human Coding and Computational Text Analysis: Assessing the Effects of Public Pressure on Policy

Devin Judge-Lord | Harvard University


Human coding and computational text analysis are more powerful when combined. I describe a suite of exact methods that can increase the power of common hand-coding tasks. Human coding can both inform and be aided by rule-based information extraction—iteratively structuring queries on unstructured text.

  1. Text analysis tools can strategically select texts for human coders—texts representing larger samples and outlier texts of high inferential value.
  2. Preprocessing can speed up hand-coding by extracting features like names and key sentences.
  3. Humans and computers can iteratively tag entities using regex tables (e.g., identify organizations) and group texts by key features (e.g., identify lobbying coalitions by common policy demands).

Applying this method to public comments on U.S. federal agency rules, a sample of 10,894 hand-coded comments yields 41 million as-good-as-hand-coded comments regarding both the organizations that mobilized them and the extent to which policy changed in the direction they sought. This large sample enables new analyses of lobbying coalitions, social movements, and policy change.


Hand-coding dynamic data

Workflow: googlesheets4 allows analyzing and improving data in real time. For example, in Fig. 1:

  • The “org_name” column is populated with a guess from automated methods. As humans identify new organizations and aliases, other documents with the same entity strings are auto-coded to match human coding.
  • As humans identify each organization’s policy “ask,” other texts with the same ask are put in the same coalition.
  • Once a comment’s organization and coalition are known, it no longer needs hand coding.
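The propagation step above can be sketched in Python. This is an illustrative sketch, not the author's googlesheets4 workflow; the column name "org_name" follows Fig. 1, and the rows are invented examples:

```python
# Propagate hand codes: once a human resolves an entity string to an
# organization, every other row with that entity string is auto-coded.
rows = [
    {"entity_string": "Sierra Club", "org_name": "Sierra Club"},  # hand-coded
    {"entity_string": "NRDC",        "org_name": "NRDC"},         # hand-coded
    {"entity_string": "Sierra Club", "org_name": None},           # awaiting coding
    {"entity_string": "NRDC",        "org_name": None},           # awaiting coding
]

# Lookup of entity strings that humans have already resolved.
known = {r["entity_string"]: r["org_name"] for r in rows if r["org_name"]}

# Auto-code remaining rows whose entity string matches a resolved one.
for r in rows:
    if r["org_name"] is None:
        r["org_name"] = known.get(r["entity_string"])
```

In the real workflow, the sheet is re-read and re-written (with googlesheets4), so coders see auto-filled guesses as they work.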

Fig. 1: Coded Comments in a Google Sheet

Regex tables to tag entities

  • Deductive: Start with databases of known entities.
Table 1: Regex Table Deduced from Center for Responsive Politics Lobbying Data
Entity: 3M Co
Pattern: 3M Co|3M Cogent|3M Health Information Systems|Ceradyne|Cogent Systems|Hybrivet Systems

Entity: Teamsters Union
Pattern: Brotherhood of Locomotive Engineers & Trainmen|Brotherhood of Maint of Way Employ Div|New England Teamsters & Trucking Pension|Teamsters Airline Express Delivery Div|Teamsters Local 357|Teamsters Union|Western Conf of Teamsters Pension Trust
  • Inductive: Add entities that frequently appear in the data to regex tables.
  • Iterative: Add to regex tables as humans identify new entities or new aliases for known entities. Update data (Google Sheets) to speed hand coding.

Fig. 2: Iteratively Building Regex Tables

For example, the legislators package adds variants (e.g., “AOC”) to standard legislator names.
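Applying a regex table is plain regular-expression matching: each row maps a canonical entity name to an alternation of its known aliases. A minimal Python sketch, with patterns abbreviated from Table 1 and a hypothetical tag_entities helper:

```python
import re

# Abbreviated regex table (canonical entity -> alternation of known aliases).
regex_table = {
    "3M Co": r"3M Co|3M Cogent|Ceradyne|Cogent Systems",
    "Teamsters Union": r"Brotherhood of Locomotive Engineers & Trainmen"
                       r"|Teamsters Local 357|Teamsters Union",
}

def tag_entities(text, table):
    """Return the canonical names of all entities whose pattern matches."""
    return [entity for entity, pattern in table.items()
            if re.search(pattern, text)]

comment = "Submitted on behalf of Teamsters Local 357 and Cogent Systems."
print(tag_entities(comment, regex_table))  # ['3M Co', 'Teamsters Union']
```

When a human identifies a new alias, it is appended to that entity's pattern and the texts are re-tagged.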


Results: Who mobilizes public comments?

Of 58 million public comments on proposed agency rules, the top 100 organizations mobilized 43,938,811. The top ten organizations mobilized 25,947,612.

Table 2: The Top 5 Organizations Mobilized 20 Million Public Comments
Organization                      Rules Lobbied On   Pressure Campaigns   Campaigns/Rules   Comments    Average per Campaign
NRDC                              530                62                   11.7%             5,939,264   95,795
Sierra Club                       591                110                  18.6%             5,111,922   46,472
CREDO                             90                 41                   45.6%             3,019,150   73,638
Environmental Defense Fund        111                31                   27.9%             2,849,517   91,920
Center for Biological Diversity   572                86                   15.0%             2,815,509   32,738
Earthjustice                      235                59                   25.1%             2,080,583   35,264

Grouping with text reuse

Fig. 3: Iteratively Group Documents

Fig. 4: Identifying Groups of Linked Documents with Text Reuse (a 10-gram Window Function)

  • Comment A shares no 10-grams with the others
  • B, C, and D share some text (they are part of an organized mass comment campaign)
  • E and F are the same text that was submitted twice
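The grouping in Fig. 4 can be sketched as follows. This is an illustrative Python version with toy texts standing in for comments A–F; at scale, the method uses a 10-gram window function rather than pairwise comparison:

```python
from itertools import combinations

def ngrams(text, n=10):
    """All n-word sequences (10-grams by default) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def group_by_reuse(docs, n=10):
    """Link documents sharing any n-gram; merge links into groups."""
    shingles = {name: ngrams(text, n) for name, text in docs.items()}
    parent = {name: name for name in docs}  # union-find parents

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if shingles[a] & shingles[b]:  # a shared 10-gram links a and b
            parent[find(a)] = find(b)
    return {name: find(name) for name in docs}

campaign = "we urge the agency to strengthen the proposed rule immediately"
docs = {
    "A": "this comment is entirely original and shares no long phrases with others",
    "B": "as a concerned citizen " + campaign,
    "C": campaign + " thank you for your consideration",
    "D": "hello " + campaign,
    "E": "identical text submitted twice by the same person on two different days",
    "F": "identical text submitted twice by the same person on two different days",
}
result = group_by_reuse(docs)
# A stands alone; B, C, and D form one group; E and F form another.
```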

Results: Most public comments result from organized pressure campaigns

Fig. 5: Public Comments on Regulations.gov, 2005-2020

Comments sharing a 10-gram with 99 or more other comments are classified as part of a mass comment campaign.


Grouping with key phrases

  1. Humans identify groups of selected documents (e.g., lobbying coalitions).
  2. Humans copy and paste key phrases.
  3. The computer puts other documents containing those phrases in the same group (coalition).
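Steps 2–3 reduce to substring matching. A minimal sketch, where the coalition names, key phrases, and comments are all invented examples:

```python
# Key phrases pasted by human coders, one per coalition (invented examples).
key_phrases = {
    "clean-water coalition": "strengthen the discharge limits",
    "industry coalition": "withdraw the proposed rule",
}

def assign_coalitions(docs, phrases):
    """Put each document containing a coalition's key phrase in that coalition."""
    assignments = {}
    for name, text in docs.items():
        for coalition, phrase in phrases.items():
            if phrase in text.lower():
                assignments[name] = coalition
                break
        else:
            assignments[name] = None  # still needs hand coding
    return assignments

docs = {
    "c1": "We ask EPA to strengthen the discharge limits for mercury.",
    "c2": "The agency should withdraw the proposed rule entirely.",
    "c3": "No position stated.",
}
assignments = assign_coalitions(docs, key_phrases)
```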

Preprocessing tip: Summaries speed hand-coding (e.g., use textrank to select representative sentences).
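As a rough illustration of that tip (not the textrank package itself), the sketch below picks a representative sentence by word-overlap centrality, a simplified stand-in for textrank's PageRank over a sentence-similarity graph:

```python
import re
from itertools import combinations

def representative_sentence(text):
    """Return the sentence with the highest total word overlap with the rest."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    scores = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        overlap = len(words[i] & words[j])
        scores[i] += overlap
        scores[j] += overlap
    return sentences[max(range(len(sentences)), key=scores.__getitem__)]

text = ("The rule limits mercury discharge. "
        "We support strict mercury limits. "
        "The weather was nice.")
print(representative_sentence(text))  # The rule limits mercury discharge.
```

Coders then read the selected sentence first and open the full document only when needed.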


Results: Coalition size and coalition success

Fig. 6: Lobbying Success by Campaign Size

Public pressure on climate and environmental justice greatly affected policy documents (Fig. 7), but a few organizations dominate lobbying coalitions (Table 2). When tribal governments or local groups lobby without the support of national advocacy organizations, policymakers typically ignore them.

Fig. 7: Policy Text Change by Coalition Size

Next steps

  • Compare exact entity linking (regex tables) to probabilistic methods (linkit, fastLink, ML with a hand-coded training set)
  • Compare exact grouping (e.g., by policy demands) to supervised probabilistic classifiers/clustering